Executive Summary


Open Compute Project equipment that uses immersion cooling may have some unique and
specific requirements. The compute performance as well as the protection from overheating
that is gained with immersion is generally worth the investment. Nearly all computing and
communications equipment today is designed and manufactured for operation in air.
Immersion cooling requires attention to several material and fluid handling specifications to
ensure safe and reliable operation. This document provides immersion guidelines and best
practices from experts in thermal handling, fluid materials science and engineering, server
integration, and power connectivity that OCP has brought together worldwide.
Abstract


The following topics are covered within this whitepaper: 


Material Compatibility: Determining compatibility with immersion. Not all parts of the
end-user supply chain will be tasked with ascertaining compatibility. For example, the CPU
thermal interface material (TIM) that sits between the chip die and the Integrated Heat
Spreader (IHS) is best researched and qualified by the original manufacturer, whereas a
network extension cable can be practically validated and selected by a system integrator or
end user. High level distinctions exist between the requirements for single-phase
Hydrocarbons/Fluorocarbons and for two-phase Fluorocarbons.


Thermal Design: Changes in thermal behavior commonly result from immersion. When IT 
equipment is optimized for immersion, more benefit can be gained. This section describes the 
potential impact and extent of new possibilities when thermal behavior under immersion is 
considered in designing devices and equipment. 


Mechanical Design: Certain fundamental changes can be expected when working with
immersion technologies. Immersion rack enclosures are often quite different from traditional
air racks. Shape, position and operation can be optimized in ways that differ from air-cooled
equipment. Vertical positioning of CPUs in an open bath immersion system and fully sealing an
enclosed chassis are two examples. This results not only in a different IT design, but also in a
different operating model (which is not fully covered in this whitepaper).


Electronic Design: Density and layout are covered, along with new considerations for
electronics designed for operation in fluid: signal integrity, network connectivity, CPUs,
storage devices and more.


Software: BIOS, Firmware and IPMI features which should be implemented to allow effective 
operation within immersion solutions, without disqualifying the operation of the same 
equipment in air. 


Required reading: OCP ACS Immersion requirements document: 
https://www.opencompute.org/documents/ocp-acs-immersion-requirements-specification-1-pdf 
1. Introduction


This Open Compute Project (OCP) whitepaper is written for current and potential designers, 
manufacturers, integrators and end users of immersion-cooled OCP-ready equipment. OCP 
equipment that is air-designed can be retrofitted and supported as well, with the help of this 
reference guide. Integrators and component suppliers will find useful information specific to 
their preparation for immersion-cooled OCP systems. 


The nature of immersion cooling requires attention to all components and materials that will 
come into contact with dielectric fluids. The OCP Immersion workstream discusses 
considerations for cooling with dielectric fluids and the layout of IT equipment to 
accommodate thermal optimization and fluid compatibility. Immersion systems are supported 
through enclosed chassis for vertical Open Rack or tank-style integration and can support 
single or two-phase fluid cooling types. 


Immersing servers and information technology (IT) equipment in a dielectric fluid enables 
substantial energy savings and accommodates growing load densities. The existing 
proprietary immersion cooling solutions and numerous case studies have established the 
effectiveness and energy savings for new construction or a retrofit from the device to the 
facility level. 


Immersion cooling of data center equipment promises to improve reliability and overall 
equipment life, with lower service and repair costs. Immersion cooling greatly reduces failures 
such as solder joint failures, oxidation and corrosion of electrical contacts, electrostatic 
discharge, and ambient particulate. It allows for much more consistent and controlled 
operating temperature and humidity. Bill of materials (BOM) component count and cost 
reductions are intrinsic to immersion too, as fans, conventional heat sinks, and operating 
environment controls such as humidity sensors are eliminated. These reliability advances 
include a reduction in corrosion and electrochemical migration, lessening of environmental 
contamination like dust, debris, and particulates, reduced thermal shock, and mitigation of tin 
and zinc whiskers. Furthermore, the improved thermal management of components, due to the
higher heat capacity of fluids, can provide a significant increase in compute performance.
2. Material Compatibility


The chemical and physical interactions between the dielectric fluid and various components 
introduce component lifecycle loads that differ from those observed in traditional air cooling 
and can result in degradation of material properties or component functionality. 
It is important to understand the mechanisms by which the hydrocarbon or fluorocarbon
coolant and the components interact, as well as the available mitigation options. The extent
to which the coolant acts as a contaminant vector is central to the study of reliability in
immersion cooling.


This chapter suggests experiments for predicting degradation of cables, printed circuit boards, 
packages, optical fibers and passive components. Dielectric coolant health is typically 
assessed by measuring shifts in composition or thermophysical properties under conditions 
representative of the end-use application. 


2.1 Components to Check for Compatibility & Material Source of Contaminant 


In an air-cooled data center environment, air quality is monitored and maintained to protect
critical infrastructure and the reliability of ITE. According to ASHRAE, the potential
reliability issues derive from particulates and corrosive contaminants. The ASHRAE white
paper ‘2011 Gaseous and Particulate Contamination Guidelines for Data Centers’ recommends
that data center air quality be monitored and cleaned according to ISO 14644-8. In the case of
immersion-cooled systems, the electronics and supporting equipment themselves may act as a
source of contaminants. In this sense, the designer has control over contamination.


All materials need to be validated for implementation in immersion systems. The following 
tables outline common examples to take into consideration for such validation. 
Figure 1- Potential issues and a mitigation plan in ITE immersion design: Examples of undoing stickers


Labels (stickers)

Image: after immersion

Potential issue: Dissolvable glue/adhesives and inks for all fluids. No recorded polluting effects.
Effect: Information loss.
Mitigation: Document sticker information, cover with coolant resistant tape (acrylic), use etched label.


Figure 2- Potential issues and a mitigation plan in ITE immersion design: Examples of EPDM swelling in capacitors


Capacitors

Potential issue: EPDM sealing may interact with fluids.
Effect: Swelling of EPDM sealing and bending of terminal leads.
Mitigation: Use different capacitors which do not contain EPDM.
Figure 3- Potential issues and a mitigation plan in ITE immersion design


Connectors, sockets, peripherals or sealants

Potential issue: Materials may have fluid compatibility issues.
Effect: Functionality may be affected, voltage drop, disconnects.
Mitigation: Consider different material.


CMOS battery

Potential issue: Battery materials may have fluid compatibility issues.
Effect: Loss of CMOS information and/or system functions.
Mitigation: Ensure compatible batteries.


Figure 4- Potential issues and a mitigation plan in ITE immersion design: Example of exploded semiconductor due to failed relay


Relays (PSU)

Potential issue: Unsealed relays may be slowed down by fluid viscosity.
Effect: Loss of function of affected circuits or overloaded circuitry.
Mitigation: Select different PSU or use immersion compatible relays.
Figure 5- Potential issues and a mitigation plan in ITE immersion design: Example of disintegrated heat shrink


Heat shrink

Potential issue: May be intolerant to dielectric fluids, especially at higher temperatures (60°C+).
Effect: Materials may disintegrate/react with fluid. Pollution of dielectric fluid.
Mitigation: Replace with suitable materials.


Figure 6- Potential issues and a mitigation plan in ITE immersion design: Example of stiffened and disintegrated wire jacket


Cables

Potential issue: Material compounds like plasticizers, chlorine, sulphur etc. in the jackets may dissolve into the dielectric fluid.
Effect: Materials may disintegrate/react with fluid. Pollution of dielectric fluid.
Mitigation: Use compatible cabling.
Figure 7- Potential issues and a mitigation plan in ITE immersion design


Thermal compounds and pastes

Potential issue: May dissolve or be affected by immersion.
Effect: Reduced thermal transfer capabilities of affected assembly. Pollution of the dielectric fluid.
Mitigation: Replace with compatible TIM (e.g. Indium foil), remove TIM or remove sink/spreader.


Figure 8- Potential issues and a mitigation plan in ITE immersion design


Heat sinks

Potential issue: Air design heat sink may not be suitable for immersion.
Effect: Component is not effectively cooled. Different thermal performance is expected.
Mitigation: Remove if not required; ignore if component remains within required temperature limits; replace with design optimized for immersion.


Mechanical HDDs

Potential issue: Fluid may penetrate through the air vent.
Effect: HDD malfunction due to penetrated fluid causing mechanical resistance to sensitive mechanical parts in HDD.
Mitigation: Use hermetically sealed helium or solid-state drives.
2.2 Material Compatibility: Single-Phase Immersion Cooling (Hydrocarbons)


Traditional material compatibility tests like ASTM D471 and D2240 can be performed. In such 
tests, a material is typically soaked in the fluid for a period of time, often at elevated 
temperature (above expected operating temperatures), and changes in properties such as 
hardness, durometer and volume are recorded and compared with the results of similar tests 
in air. Some of the proven methods are explained below: 


Figure 9- Test conditions for material compatibility


Test # 1: General material compatibility

To understand material compatibility of servers under operation:

1. Visual inspection/cosmetic changes of the samples from the immersed server;
2. Check the deposition of plasticizers, debris and contaminants on components and their
   effects on thermal performance;
3. Cross sectioning and optical microscopic images to analyze electronic packaging structures;
4. Thermal testing and analysis by tracking component temperature and performance over time;
5. Mechanical testing and analysis of structural components, such as socket, retention, clips, etc.;
6. Corrosion of electronic interconnects, solder materials and any exposed metallization
   including the chassis.


Test # 2: Thermal aging of solder, PCBs, PVCs, Optical Fibers, SFP or QSFP, and Passive Components

Experimental approach: Thermal aging/accelerated testing of immersed samples of PCBs, PVC jackets,
passive components and optical fibers using an oven or environmental chamber.

Understanding material compatibility of components:

1. Mechanical, thermal, and electrical testing of aged samples;
2. Structural analysis through optical, X-ray tomography and SEM (scanning electron microscopy)
   analysis.
2.3 Material Compatibility: Single-Phase Immersion Cooling (Fluorochemicals)


Since most organic polymers used in the fabrication of modern electronics are hydrocarbon
in nature, fluorochemical fluids have virtually no affinity or solvency for them. Clean
fluorochemical fluids therefore show excellent compatibility in traditional material
compatibility tests like ASTM D471, D2240, etc. In real world immersion cooling applications,
the fluorochemicals will become contaminated with hydrocarbons such as dioctyl phthalate 
(DOP) that is extracted from polyvinyl chloride (PVC) wire insulation or silicone oils extracted 
from silicone polymers, solder flux, or thermal interface materials, for instance. Because the 
fluorochemical has very little solvency for these hydrocarbon contaminants, it is easily 
saturated with them (perhaps at concentrations of only a few parts per million). This also 
means that the fluorochemicals readily give up these hydrocarbon contaminants to other 
materials that have an affinity for them. One may therefore observe swelling of a hydrocarbon 
polymer as it absorbs material extracted from another hydrocarbon polymer. 


This is called a “secondary incompatibility” because the fluorochemical merely acts as a 
contaminant vector. Effects on the performance of either the source or sink of a contaminant 
are rare. For example, leaching out of plasticizers from CAT6 network cables will make the 
jacket stiffer but will not affect the functionality of the cable. In two-phase immersion, 
fluorochemical fluids can act as a vector in another way. 


2.4 Material Compatibility for Two-Phase Immersion Cooling (Fluorochemical Fluids)


The boiling and condensation processes inherent to two-phase cooling have important 
implications for material compatibility and system health. During boiling, for example, the 
relatively non-volatile hydrocarbon contaminants dissolved in the fluid are deposited on 
boiling surfaces by distillation in much the same way that lime accumulates in a tea kettle with 
time. The vapor evolved by boiling the fluid, being freshly distilled, is free of hydrocarbon 
contaminants and once condensed, has a high affinity for them. If this condensate comes into 
contact with elastomers containing hydrocarbon contaminants, the fluid will extract or solvate 
these substances and upon returning to the boiling fluid, leave the oil behind. 


This mechanism for transporting relatively non-volatile contaminants from one part of the 
system to another is unique to two-phase systems and forms the basis for Soxhlet extraction, 
a technique by which a fluid is used to extract mobile compounds from a solid phase. It is used, 
for example, to extract essential oils from plants, or lipids from food samples to assess fat 
content. This quick and inexpensive lab test for assessing material compatibility in both single 
and two-phase applications should be done as explained below. 
2.5 The Soxhlet Extraction Material Compatibility Test


Details of the Soxhlet Extraction Material Compatibility Test can be found in various 
publications. What differentiates it from more common test methods, such as the ASTM soak 
tests mentioned earlier, is its ability to simultaneously measure both the mass of fluid 
absorbed by the polymer and the mass of the material that could be extracted from it. This 
makes the Soxhlet test more useful for assessing material compatibility in immersion cooling, 
particularly for two-phase systems. 


For single-phase applications, the boiling temperature of the fluid is typically hot enough
(130°C or higher) to “burn” most organic materials, and it does not simulate the end-use
operating temperature well. For those reasons, one typically does not run the Soxhlet
extraction test with the actual working fluid for single-phase applications.


Fluorochemical fluids (PFCs) with lower boiling temperatures can be substituted, in this case, 
to allow lower test temperatures. This practice is defensible because smaller PFC molecules 
are more able to get into a polymer (absorption) and are better solvents for extracting 
materials than their larger cousins. Their use therefore represents a “worst case.” 


2.6 Experiment Method 


The Soxhlet extraction compatibility test is intended to quantify compatibility by measuring
the ability of the fluorinated fluid to extract relatively non-volatile materials such as oils
from the sample and the ability of the sample to absorb the fluorinated fluid under
atmospheric reflux conditions in a Soxhlet extractor. Its ability to separate extraction (me%)
and absorption (ma%) makes the Soxhlet test more useful for assessing compatibility than
conventional soak tests, which provide only an overall mass change. A 48-hour Soxhlet
extraction test gives “worst case” results at the fluid boiling point.


2.7 Evaluation method 


a) weight loss (extracted); 


b) weight gain (swelling). 
2.8 Meaning of the test results, extractables


Extractable materials should be as low as possible in the sample. A mass loss below 2%
indicates good compatibility. Too much mass loss can change the dimensions of the part and
may make the material brittle and prone to cracking. Too much material loss can also cause
problems elsewhere in the system. This guideline is based on multiple criteria, including
that the physical material properties should not change to an appreciable extent, and the
loading and efficiency of the filtration media. (Source: 3M)


This interpretation of the Soxhlet test results is a general recommendation to help the system 
engineer to design correctly. Please consult with supplier tech support for your specific 
application. It is recommended to have the proper filtration system to mitigate any 
extractables. 
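

The evaluation in sections 2.7 and 2.8 can be sketched as a small calculation. This is a minimal
illustration, not the published test procedure: it assumes three weighings that the text does not
spell out (the sample before the test, directly after extraction while still holding absorbed
fluid, and after drying), and the function and field names are hypothetical. Only the 2%
extractables guideline is taken from the text above.

```python
from dataclasses import dataclass

# Guideline quoted in section 2.8: less than 2% loss by mass indicates good compatibility.
EXTRACTABLE_LIMIT_PCT = 2.0


@dataclass
class SoxhletResult:
    extracted_pct: float     # weight loss (extracted), relative to initial mass
    absorbed_pct: float      # weight gain (swelling), relative to initial mass
    within_guideline: bool   # True when extracted_pct is below the 2% guideline


def evaluate_soxhlet(m_initial_g: float, m_wet_g: float, m_dry_g: float) -> SoxhletResult:
    """Separate extraction from absorption using three weighings (assumed procedure).

    m_initial_g: sample mass before the test
    m_wet_g:     sample mass right after extraction, absorbed fluid still present
    m_dry_g:     sample mass after the absorbed fluid has been driven off
    """
    if m_initial_g <= 0:
        raise ValueError("initial mass must be positive")
    extracted_pct = 100.0 * (m_initial_g - m_dry_g) / m_initial_g
    absorbed_pct = 100.0 * (m_wet_g - m_dry_g) / m_initial_g
    return SoxhletResult(extracted_pct, absorbed_pct, extracted_pct < EXTRACTABLE_LIMIT_PCT)


if __name__ == "__main__":
    # Hypothetical sample: 10.0 g before, 10.2 g wet, 9.9 g dry.
    result = evaluate_soxhlet(10.0, 10.2, 9.9)
    print(f"extracted {result.extracted_pct:.2f}%, absorbed {result.absorbed_pct:.2f}%, "
          f"within guideline: {result.within_guideline}")
```

A conventional soak test would report only the net change between the first two weighings, which
is why it cannot tell a sample that swells and leaches from one that does neither.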
3. Thermal Design


In immersion, heat is transferred by fluid flow or phase change of the fluid over the hot
components. The flow is generated either by forced convection, where an external force such
as a pump drives the flow, or by natural convection, where density variations drive it: the
dielectric fluid is less dense at high temperature than at low temperature, so the heated
fluid rises and generates a flow. Natural convection plays only a minor role in most sealed
chassis single-phase immersion configurations but must be considered in open bath systems. A
passive two-phase immersion cooling system is one in which heat generating electronics are
immersed in a bath of dielectric coolant that boils on the heat generating devices. The heat is
captured efficiently as saturated vapor and can be transferred efficiently by condensation to
an external heat sink like air or water.


In single-phase immersion, to generate optimal and efficient cooling one maximizes the flow 
rate through the heat sink and over hot components with the strictest cooling requirements. 
The dielectric fluids are more viscous than air, which makes the generation of a turbulent flow 
more challenging. A turbulent flow is more efficient in removing heat than a laminar flow. 
However, there might be opportunities for enhancement by generating an unsteady versus a 
steady laminar flow. The unsteady flow may assist with breaking up the thermal boundary 
layer and therefore enhancing the heat transfer capabilities. Importantly, a flow is generated 
either by forced or natural convection in immersion but heat still needs to be extracted from 
the immersion solution. The extraction is accomplished through a heat exchanger or a 
condenser to ensure continued cooling. 


The type of boiling that occurs in a passive two-phase system is most often called saturated
pool boiling because it occurs within a pool of fluid uniformly heated to the fluid’s saturation
or boiling temperature with saturated vapor above that fluid. Direct submersion of a bare die
or lidded package is rarely an optimal way to do two-phase immersion. Various techniques can
be used to enhance boiling heat transfer: extended surfaces like metallic or graphite fins or
foams, which function primarily to spread heat to a lower heat flux and thereby reduce the
wetted surface superheat; porous organic coatings; and porous metallic coatings (preferred
today).


3.1 Heat transfer optimization 


Whether a component is installed in a single- or two-phase solution, high power components 
such as a GPU or CPU will run much more effectively at lower component temperatures. Heat 
transfer enhancement is necessary around these vital devices. 
With immersion cooling, a dielectric fluid is in contact with the entire IT gear and its printed
circuit board, and this fluid creates a thermal pathway for cooling of all components.
Therefore, many low power components that may require a heat sink in air cooling can be
cooled in immersion without a heat sink.


The importance of ensuring a thermal pathway is highlighted in the example of full fluid cooling
using cold plates, where no air cooling is present [5]. Immersion cooling is the opposite case:
every component is in contact with the fluid. The power threshold below which an indirectly
cooled component needs no dedicated thermal solution is therefore higher than for air cooling,
since fluids are a more efficient heat transfer medium than air. Low power components such as
Voltage Regulators (VRs), chipsets, Baseboard Management Controllers (BMCs) and embedded
GPUs may not require heat sinks in immersion.


That said, the power threshold below which a thermal solution is not needed in immersion
should be evaluated on a case-by-case basis.


3.2 Single-phase heat transfer optimization 


Note: The following paragraph states many properties of fluids. Unless specified, room 
temperatures may be assumed. 


Common single-phase dielectric fluids have specific heat values ranging between 1300
J/kgK (Fluorocarbon) and 2300 J/kgK (Hydrocarbon), while air has a specific heat value of 1000
J/kgK. For reference, water has a specific heat value of ~4180 J/kgK.


As its unit suggests (J/kgK), specific heat relates to the mass (kg) of the fluid. This is where
the density of the fluid is of importance.


The density of the dielectric fluids is much higher than that of air. Therefore, combining heat
capacity with density provides insight into how fluid cooling affects heat sink designs. The
relevant information here is the amount of energy which can be absorbed by a certain volume of
the fluid. It should be noted that 1 Watt equals 1 Joule per second (1 Wh = 3600 J). The result
is a much smaller surface area requirement for heat transfer within fluid as compared to air.


The following table describes the different heat capacities of the main dielectric fluid groups 
and how the heat capacity relates to its ability to absorb thermal energy per liter. Note the near 
identical thermal capability of Hydrocarbons and Fluorocarbons in this comparison. This is 
explained by the higher density of Fluorocarbons which compensates for the lower heat 
capacity, which is related to mass instead of volume. 
Figure 10- Comparison of heat capacities amongst fluid groups


Medium type              Specific heat   Volume/kg   Joules/litre (per K)
Water (reference only)   4182 J/kgK      1 L         4182 J/LK
Hydrocarbon              2300 J/kgK      1.24 L      1854.8 J/LK
Fluorocarbon             1300 J/kgK      0.71 L      1831.0 J/LK
Air                      1000 J/kgK      773.46 L    1.3 J/LK


Notes to this table: 


- Even though water is not a suitable medium for immersion cooling, it is included in the table to provide a frame 
of reference. 

- All specific heat numbers of used cooling mediums are rounded and generic to prevent specific fluid 
references. 
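

The per-litre column of the table follows from dividing the specific heat by the volume that one
kilogram of the medium occupies. The short sketch below reproduces that arithmetic with the
rounded, generic figures from the table. The 1 L/kg value for water is an assumption (the
textbook density of water), and the names are illustrative only.

```python
# (specific heat in J/kgK, volume in litres per kg), generic rounded values from Figure 10.
# The 1.0 L/kg entry for water is an assumption based on the usual density of water.
MEDIA = {
    "Water (reference only)": (4182.0, 1.0),
    "Hydrocarbon": (2300.0, 1.24),
    "Fluorocarbon": (1300.0, 0.71),
    "Air": (1000.0, 773.46),
}


def volumetric_heat_capacity(specific_heat_j_per_kg_k: float, litres_per_kg: float) -> float:
    """Energy in joules absorbed by one litre of the medium per kelvin of temperature rise."""
    return specific_heat_j_per_kg_k / litres_per_kg


if __name__ == "__main__":
    for name, (c_p, litres) in MEDIA.items():
        print(f"{name:<24} {volumetric_heat_capacity(c_p, litres):8.1f} J per litre per K")
```

The near-identical results for hydrocarbons and fluorocarbons come straight out of this division:
the fluorocarbon's lower specific heat is offset by the smaller volume per kilogram.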


3.3 Flow rate differences 


Immersion in dielectric fluids can allow for greatly reduced heat transfer surface areas as the 
fluids are able to transfer heat much more effectively than air. 


It is, however, not simply a matter of stating that the surface area can be reduced by a factor
of roughly 1400 (1831/1.3), as this would require flow rates identical to those of air cooling.
The higher heat transfer capability of the dielectric fluids compared to air means that the
flow rate can and will be greatly reduced. Thus, the thermal designer must balance the reduced
flow rate with the required heat transfer surface area.
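

To make the flow rate trade-off concrete, the sketch below uses the standard energy balance
(power = volumetric heat capacity x volumetric flow x coolant temperature rise) to estimate the
flow needed to carry a given heat load. The per-litre heat capacities are the generic values from
Figure 10; the 1 kW load and 10 K temperature rise are hypothetical numbers chosen only for
illustration, and the function name is not from any library.

```python
FLUOROCARBON_J_PER_L_K = 1831.0  # generic value from Figure 10
AIR_J_PER_L_K = 1.3              # generic value from Figure 10


def required_flow_l_per_s(power_w: float, heat_capacity_j_per_l_k: float,
                          delta_t_k: float) -> float:
    """Volumetric flow (L/s) that carries power_w with a coolant temperature rise of delta_t_k."""
    if heat_capacity_j_per_l_k <= 0 or delta_t_k <= 0:
        raise ValueError("heat capacity and temperature rise must be positive")
    return power_w / (heat_capacity_j_per_l_k * delta_t_k)


# Hypothetical operating point: 1 kW of heat, coolant warming by 10 K across the equipment.
POWER_W = 1000.0
DELTA_T_K = 10.0

fluid_flow = required_flow_l_per_s(POWER_W, FLUOROCARBON_J_PER_L_K, DELTA_T_K)
air_flow = required_flow_l_per_s(POWER_W, AIR_J_PER_L_K, DELTA_T_K)

if __name__ == "__main__":
    print(f"fluorocarbon: {fluid_flow:.4f} L/s")
    print(f"air:          {air_flow:.2f} L/s")
    print(f"ratio:        {air_flow / fluid_flow:.0f}x")
```

At the same temperature rise the dielectric fluid needs a far smaller volume flow than air, so
the designer can trade some of that margin for lower flow rather than only for smaller surfaces.
This balance covers energy transport only; it says nothing about the local heat transfer
coefficient at the fins, which still has to be assessed by measurement or CFD.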


To properly design a heat sink for single-phase immersion, a multitude of parameters should
be considered for each type of dielectric fluid. In some cases, the target operational
temperature may play a role in heat sink design, as fluid properties (e.g. density, viscosity,
specific heat) may change significantly with temperature.


3.4 Single-Phase heatsink design parameters 


The first element which should be considered is the thermal resistance for the heat sink design 
to determine effectiveness of the thermal solution. If efficient cooling is obtained with the 
initial design, no redesign is needed. However, if an optimized solution is preferred, 
modifications to the heat sink design might be required. 
Compared to traditionally used heat sinks for air cooling, immersion heat sinks will benefit
from a larger fin pitch due to the increased viscosity of immersion fluids compared to air. The
pitch is the distance between two separate fins. The pitch is mostly affected by the viscosity of
the dielectric fluid which is used. The viscosity indicates the ease with which a fluid will flow.


The fin specification is a very important aspect of heat sink design in fluid. Because of the high 
heat capacity of the dielectric fluid, heat is transported away from the fins effectively. The use 
of measurements and CFD (Computational Fluid Dynamics) for analysis is recommended. To 
maximize the effectiveness of the heat sink surface area, the thermal energy needs to be able 
to travel through the fins. Sufficient fin thickness should therefore be considered. Combined 
with the fin thickness, other fin properties like height and length can usually be drastically 
reduced to allow significant space optimization within the chassis. 


The base of the heat sink is just as important as the fin specification as the base is responsible 
for distributing all thermal energy to the fins. In some cases, a solid metal plate may be enough 
for heat spreading while in other configurations the heat sink may require heat pipes or vapor 
chambers to effectively spread the heat in the base across the fins. 
Some examples of fluid optimized heat sinks for open bath systems can be seen in the following
images:


Figure 11- Comparison of heat sink designs


Heat sink examples (by Asperitas)

- Fluid heat sink test setup for Intel® Xeon® Scalable processors.

- Exaggerated application specific fluid heat sink for AMD EPYC™.

- Application specific fluid optimized heat sink for dual GPU sandwich.
Heat sink designs for sealed chassis configurations can take the form of a cold plate if forced
convection is used to direct dielectric fluid to the hottest components. The following image is
an example of how a single-phase sealed chassis system can work:


Figure 12- Cold plate in immersion


Sealed chassis immersion cold plate (by LiquidCool Solutions)

- Enclosed server cross section illustrating fluid flow through directed flow cold plate.

- All components including processors, PSU, memory and storage are immersed in the dielectric
fluid, but the entering fluid is directed to the processors first before being vented to the
dielectric in the surrounding enclosure.
3.5 Two-phase heat transfer optimization


For optimal performance in two-phase immersion, any device (CPUs, GPUs, some ASICs,
FPGAs, etc.) that would require a copper heat sink in an air-cooled and/or a single-phase
immersion-cooled environment should have a boiler assembly applied to it. Devices that would
typically have an extruded aluminum heat sink in an air-cooled environment (surface mount
voltage regulators, through-hole MOSFETs, diodes, some ASICs) do not require boiling
enhancement. There is merit in providing boiling enhancement of some kind (such as paint-on
organic BECs) for higher power (~30W) bare die devices.


3.6 Two-Phase Boiler Assembly 


Heat spreader: typically copper for its high thermal conductivity. Area and thickness are
dictated by heat spreading requirements.


Boiling Enhancement Coating (BEC): typically porous copper, as described below, applied to a
heat spreader. The thermal performance of higher power devices such as CPUs and GPUs is
improved using BECs. BECs can take various forms but are typically microporous copper
coatings 100-500 microns thick. BECs can provide up to a 15x increase in boiling heat transfer
coefficient versus a smooth surface. BECs are often applied to a copper heat spreader to
create a boiler that can be applied to the CPU or GPU with various thermal interface
materials, with or without a retention mechanism.


Retention mechanism: applies force to the boiler to ensure a good thermal and/or electronic
(socket) interface. The retention plate is typically made of aluminum or steel and may or may
not be bonded to the boiler. It may include springs, screws, etc. to apply force to the boiler.


Thermal Interface Materials (TIMs) are used to reduce the thermal resistance between two
components, such as CPU/GPU and heat sink/BEC. There are many different types of TIMs
available, and in this document they are divided into two main groups: solid TIMs and
non-solid TIMs. Solid TIMs can be made of metal, while non-solid TIMs include, for example,
thermal greases. It is important to choose the TIM carefully in immersion. One consideration
is to ensure that the TIM selected reduces the thermal resistance sufficiently to meet the
thermal requirement of the component being cooled (in the same way as for air and cold plate
cooling). In immersion, it is also essential to ensure compatibility between the immersion
fluid and every immersed material, including the TIM. In many immersion applications, a
solid TIM such as Indium foil is used and preferred.
Figure 13- Boiler plate assembly


Boiler assembly terminology

Diagram labels: Fluid, Boiling Enhancement Coating (BEC), Heat Spreader, TIM, Lid, Silicon Die,
Package Substrate.

- Left image: Terminology for boiler assemblies applied to bare die.

- Right image: Terminology for boiler assemblies applied to lidded devices.

Figure 14- Boiler plate assembly in situ 


Boiler assembly example 


a. BEC soldered to IHS;
b. BEC boiler applied to lidded GPU with thermal grease; 
c. BEC boilers applied to 750W ASICs with Indium TIM. 
Figure 15- Boiler retention plate assembly


Retention plate examples

- Exploded view of boiler and retention plate applied to lidded and bare die devices with a
thermal interface.

- Boiler assembly for Intel Skylake including plated aluminum retention plate, screws, etc.


3.7 Component placement for Single Phase fluids 


As in systems designed for air, the layout of all electronics should take into account their 
heat dissipation (TDP in watts) and maximum temperature tolerance (in °C) in relation to the 
fluid flow direction. 


Components which generate more heat and/or require lower operating temperatures should 
be placed upstream or in the coldest/lower part of the tank. Components with a high tolerance 
for heat can be placed downstream or in the highest parts of the tank. The middle area can be 
filled with all remaining components, while considering any thermal component constraints. 


The thermal conditions within the chassis depend not only on the heat produced within the 
chassis or tank, but also on the total heat produced within the rest of the immersion system 
and on the properties of the cooling supply. This means that IT systems can influence each 
other. 


The flow direction in combination with component placement should also be considered 
carefully. Thermal shadow effects may impair the desired operation of components, while in 
other situations they may be desirable to allow flow optimization. 


T0: Bottom section/cold is where the dielectric fluid temperature is predictable and has a 
direct relation with the FWS cooling temperature. Since the FWS cools the dielectric fluid by 
means of a heat exchanging device, there is always a temperature difference (ΔT) between the 
dielectric fluid and the FWS. Since this ΔT differs between immersion technologies, it is 
important to understand this value if systems are designed for specific FWS supply 
temperatures. 


This is also the area which is usually considered for GPUs (high TDP), PSUs (temperature 
tolerance) or SSDs (temperature tolerance). 


T1: Center/moderate is the area where the dielectric fluid is pre-heated by the most sensitive 
or highest performing components. The environment temperature in this position is usually 
higher due to this pre-heating. The actual temperature depends on the components within the 
bottom section. 


This area is commonly used for CPUs and all components which are integrated into or attached 
to the mainboard. 


T2: Top/warm is the area where hotspots may be encountered, so components placed at the top 
of the system should have the highest temperature tolerance. Heat generated by any component 
in the chassis moves upwards naturally. This area should therefore only be populated with 
components that are suitable for operation at the highest temperatures expected for the 
designed IT assembly. 


This area is usually populated with PSUs (ease of access), SSDs (temperature tolerance/low 
TDP) and interface cards (temperature tolerance/low TDP). 


Figure 16- Stratification of immersion temperature zones 


Immersion temperature zones (top to bottom) 


Zone | Fluid temperature | Typical contents 
Non-immersed area | ~ T2, somewhat dependent on ambient | Dry interface connections 
T2, Warm | T0 + 7-18°C | PDU / SSD / Interface cards 
T1, Moderate | T0 + 3-13°C | CPU / GPU / Interface cards 
T0, Cold | FWS (cooling) input + delta T | GPU / PSU / SSD 


Open Compute Project: Open Cassette Specification 


Temperature stratification of immersion bath system example to consider IT equipment component 
placement. 
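
As a rough illustration of how the zone offsets in Figure 16 might be used during layout, the 
following Python sketch estimates the fluid temperature range in each zone from an FWS supply 
temperature and a delta T, and lists the zones whose worst-case fluid temperature stays within 
a component's limit. The offsets are taken from the figure; the function names, the example 
supply temperature, the delta T and the component limit are hypothetical and chosen only for 
illustration. 

```python
# Zone offsets relative to T0 in degrees C, taken from Figure 16.
ZONE_OFFSETS_C = {
    "T0": (0.0, 0.0),
    "T1": (3.0, 13.0),
    "T2": (7.0, 18.0),
}


def zone_temperatures(fws_supply_c, delta_t_c):
    """Return {zone: (min_c, max_c)} for the dielectric fluid.

    T0 is modelled as the FWS supply temperature plus the heat exchanger
    delta T, as described for the cold zone.
    """
    t0 = fws_supply_c + delta_t_c
    return {zone: (t0 + lo, t0 + hi) for zone, (lo, hi) in ZONE_OFFSETS_C.items()}


def allowed_zones(max_fluid_temp_c, zones):
    """List the zones whose worst-case fluid temperature is within the limit."""
    return [zone for zone, (_, hi) in zones.items() if hi <= max_fluid_temp_c]


# Hypothetical example: 30 C FWS supply and a 5 C delta T.
zones = zone_temperatures(fws_supply_c=30.0, delta_t_c=5.0)
for name, (lo, hi) in zones.items():
    print(f"{name}: {lo:.0f} to {hi:.0f} C")
print("zones suitable for a 50 C limit:", allowed_zones(50.0, zones))
```

The sketch only screens zones by fluid temperature; actual placement also depends on TDP, flow 
direction and serviceability, as discussed above. 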


3.8 Component placement for Two Phase fluids 


Two-phase immersion cooling provides an isothermal environment in the tank; thermal shadowing 
is therefore not relevant, at least not at the densities reached today, and stratification 
plays no significant role. Instead, consideration must be given to the spacing of components 
in the upper fluid area to allow an escape path for the vapor produced by boiling at the 
components housed at the lower fluid levels. The total free space required throughout this 
area is determined by the amount of fluid that has changed phase, which follows from the power 
used by the operating components in the lower regions and the difference in enthalpy between 
the liquid and vapor phases of the fluid used. In general, components which generate more heat 
and/or require lower operating temperatures could be placed at the bottom of the fluid. 
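
The relation between component power, enthalpy difference and the amount of vapor can be 
sketched as a simple energy balance. The Python example below assumes steady state and that 
all dissipated power goes into phase change; the fluid properties and the power figure are 
hypothetical placeholders, not data for any specific fluid. 

```python
def vapor_mass_flow_kg_s(power_w, latent_heat_j_per_kg):
    """Vapor mass flow (kg/s), assuming all heat goes into phase change."""
    if latent_heat_j_per_kg <= 0:
        raise ValueError("latent heat must be positive")
    return power_w / latent_heat_j_per_kg


def vapor_volume_flow_m3_s(power_w, latent_heat_j_per_kg, vapor_density_kg_m3):
    """Vapor volume flow (m^3/s) that has to escape through the upper fluid area."""
    if vapor_density_kg_m3 <= 0:
        raise ValueError("vapor density must be positive")
    return vapor_mass_flow_kg_s(power_w, latent_heat_j_per_kg) / vapor_density_kg_m3


# Hypothetical placeholder values, not data for any specific fluid.
POWER_W = 10_000.0                # total power of the components below
LATENT_HEAT_J_PER_KG = 100_000.0  # enthalpy difference, liquid vs. vapor
VAPOR_DENSITY_KG_M3 = 10.0

mass_flow = vapor_mass_flow_kg_s(POWER_W, LATENT_HEAT_J_PER_KG)
volume_flow = vapor_volume_flow_m3_s(POWER_W, LATENT_HEAT_J_PER_KG, VAPOR_DENSITY_KG_M3)
print(f"vapor mass flow:   {mass_flow:.3f} kg/s")
print(f"vapor volume flow: {volume_flow:.4f} m^3/s")
```

A designer would compare the resulting volume flow with the free cross-section available 
between components; that comparison depends on tank geometry and is not modelled here. 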




Open Compute Project Design Guidelines for Immersion-Cooled IT Equipment 


4. Mechanical Design 


4.1 IT chassis dimensions 


Single-phase sealed chassis servers can be sized to fit into standard OCP racks. For open bath 
systems the chassis design depends on the dimensions of the tank that is being used. To be 
consistent with air-cooled servers, this section addresses the 19-inch and 21-inch server 
widths. Other chassis widths are possible, each with its own advantages and disadvantages, and 
can be used as necessary. 


Since the 19-inch form factor is still largely used in data centers and the 21-inch form factor is 
being deployed at an increasing rate, both are adaptable to immersion without requiring a 
huge redesign effort. 


The dimensions of the chassis should allow for easy installation of components while keeping 
the highest possible density. 


The length of the chassis is related to the depth of the tank. The length of the chassis should 
allow for easy extraction/removal from the tank. 


Considering the different compute requirements from users, immersion cooling is a good 
solution for high density IT equipment. To keep the height of the chassis as small as possible, 
components of lower height can be considered. The height of the DIMMs can also be a limiting 
factor in system U-height. Thus, options for reduced DIMM height such as angled connectors or 
short DIMM may be desirable. 


In open bath systems the servers are immersed in a dielectric fluid, and a minimum space 
between servers (e.g. 0.5 mm) is recommended to facilitate extraction and avoid the adhesive 
force effect. 


4.2 Chassis consideration for immersion 


For vertically-oriented tanks (open bath), the following chassis features need to be considered 
in order to improve installation, handling and serviceability: 


c) At least one handle or other hoisting feature to assist when pulling out the chassis 
vertically, preferably usable both by hand and with an assisted lifting device; 

d) Support for vertical pulling forces, for example by reinforcing the front or rear of the 
chassis where the handles are fixed; 

e) Guiding features to help control the lowering of the chassis into the tank and to keep it in 
its designated position; 

f) Features that reduce or eliminate unwanted movement (e.g. shaking) once installed in the 
tank; 

g) Fixation features for installed add-on components (e.g. PCI cards, extension boards etc.) 
when the chassis is in vertical orientation; 

h) Component orientation that will not block thermal flow. For example, components fitted 
with a heat sink should be placed with the heat sink fins parallel to the fluid flow; 

i) If mounting ears are designed, these should be removable to allow positioning in vertical 
tanks. 


The compute unit can be powered by power cable or busbar: 


a) Cable access from the top of the tank is highly recommended to allow an operator to 
unplug all cables before servicing the chassis. Cable management features, such as a cable 
trough or cable duct, are recommended to optimize the cable routing; 

b) In a busbar implementation, the blind-mate busbar clip must be floating and requires 
guiding features to accommodate the alignment tolerance between the chassis and the busbar. 


Concerning the mass of a chassis, the maximum load for operator servicing should be 
considered when deciding the maximum weight of the server. In order to meet data center 
handling requirements, the total mass of a server with all components installed should 
preferably not exceed 34 kg (75 lbs.). Furthermore, a server heavier than 18 kg (40 lbs.) is 
recommended to be handled by two people [7]. Further research or information from the 
industry is required to better understand the total mass of a server that can be safely 
handled given the constraints of the immersion environment (e.g. hydrocarbon-based fluids are 
known to be slippery). 


4.3 Power Supply 


Most open frame AC/DC power supply units (PSUs) can be suitable. A regular server PSU may 
require modification to allow long-term operation without fans and with sufficient fluid flow. 
Because PSUs often have a built-in thermal shutdown feature based on an air-cooled scenario, 
they may initiate a power-off at temperatures that are still acceptable in immersion systems. 
Modifications in either software or hardware may be required to adjust this feature or turn it 
off. A PSU backplane can be included in the chassis design to allow for a redundant power 
setup. 


In single phase or 2-phase applications, dedicated PSU and shared power (power shelf 
implementation) are feasible, depending on the tank configuration. The tank could be fitted 
with busbars or power cables for the power distribution. Reducing the I/Os through the tank 
should be considered to achieve better sealing of the tank lid. 


In-chassis placement: 


PSUs installed in the chassis (e.g. 19” standard build) must be fully immersed. For optimized 
cooling in single phase systems the PSU should be installed in the lower part of the chassis, 
near the bottom of the tank, to avoid preheat effects. Installing the PSU in the upper part of 
the chassis is also an option, with the added advantage of making the PSU accessible for 
replacement or maintenance. 


Specifically, in single phase systems, when considering the PSU location in the chassis, one 
should consider the preheat impacts on components in the downstream flow path. For 
example, flows exiting CPUs, GPUs, or other high-power components will be preheated. These 
higher temperature flows may create cooling challenges for components in the downstream 
flow path. 


Power shelf configuration: 


PSUs located in a power shelf can share the power demand among a specified number of nodes 
via busbars or power cables (e.g. OCP solution). 


One of the advantages of this solution is that, by removing the PSU from the chassis, the 
extra space can be used for additional IT gear, or the chassis can be made smaller to save 
fluid. 


The power shelf should be installed in a way that allows for easy access and removal of the 
power modules, for example in a vertical orientation with the power modules accessible from 
the top of the tank. 


If the tank is fitted with busbars, special attention should be given to positioning them out 
of reach of the operator and to protecting them from possible falling debris. 


For both options, dedicated PSU and power shelf configuration, it is important that the fans 
are disabled (e.g. via software control), unplugged or physically removed so that they do not 
alter the predefined fluid flow. 


4.4 Storage 


Direct immersion of storage devices is limited to SSD/NVMe (chip) and sealed helium drives. 
Storage can be included by using brackets for mounting storage devices into the chassis. While 
acoustic waves may be a consideration for spinning drives in air environments where fans are 
present, early data from hard drive liquid immersion studies have shown a substantial 
improvement in tracking and random performance capability. However, the acoustic wave 
effects in high throughput systems with both compute and storage components have yet to be 
evaluated for potential benefits. Nevertheless, acoustic wave considerations should be a 
design factor in the development of immersion systems [20]. 


4.5 High speed (optical) network cabling 


High speed or high-performance network cabling based on copper can normally be used within 
the chassis. 


When an optical cable interface is immersed in dielectric fluid, the air at the interface (between 
ferrules) may be replaced by the fluid. This interface change results in signal reflection loss due 
to the change in refractive index (RI) at the optical interfaces, which should be considered 
during the design process of an IT solution. 


There are several commonly applied solutions to deal with this constraint: 


1. Use cables with directly attached connectors, such as SFP or QSFP, which: 
a. Contain no air gap. These cables can be soldered or glued to the transceiver, which 
eliminates the possibility of signal reflection loss; 
b. Contain sealant to prevent fluid penetration into the air gap; 
c. Use silicon photonics, which has no air gap. 
2. Use port extenders to allow optical connectivity outside of the fluid environment. 


5. Electronic Design Guidelines 


5.1 Reference for Signal Integrity (SI) Verification 


Signal integrity of a circuit or system should be validated when using immersion technology. A 
thorough and rigorous analysis is the only way to validate that a successful outcome is likely. 
To help validate the signal integrity in immersion cooling, the measurement of an eye diagram 
should be part of the test plan and designed to meet the high-speed I/O specification. An eye 
diagram is a common indicator of the signal quality of high-speed digital transmissions. 


Note: This is an eye mask for the PCIe CEM form factor. Different platform form factors may have different eye mask 
test requirements. They should all follow the base specification requirement and the end-to-end link margin requirement. 
Some examples are the PCIe CEM specification [17], the PCIe base specification [18], and the OCP NIC 3.0 specification [19]. 


Check the S parameters to ensure that signal insertion loss and differential impedance are within the 
specification. 

Check the eye diagram to ensure the signal integrity is within the specification. 

The experiment should follow the steps below (using the PCIe CEM test item as an example). It is recommended 
that the test set-up be validated prior to immersion. 


Objective: To test the signal integrity (S parameter) of the electrical design while functional in a dielectric fluid. 

Procedure: 

1. Assemble the Device Under Test (DUT) and connect the test fixture. 
2. Turn on and set the oscilloscope to default settings. 
3. Turn on the DUT. Load the pattern generator tool and generate the test pattern signal. 
4. Capture the signal and save it as binary file (.bin) by using the oscilloscope. 
5. Test the saved waveform by using software SigTest. 
6. Check the test result and make sure the result is within the specification. 


PCIe host and card compliance test procedure can be found at https://pcisig.com/developers/compliance-program 


5.2 Signal Integrity Verification Design Consideration for Immersion 


The dielectric constant (Dk) shall be considered along with other channel performance 
parameters to meet socket and connector impedance requirements for high-speed I/O interfaces. 


The impact of immersion to the PCB transmission line needs to be evaluated and that includes 
several different considerations. 


The impact to microstrip insertion loss needs to be considered. Generally, the insertion loss 
increases slightly, with no significant impact to microstrip loss. However, the designer can 
minimize high-speed signal routing in microstrip from the beginning of the design. A full 
channel simulation analysis should be performed when the total channel loss is at the edge of 
the server platform design guidance. The following figure shows measurements of PCB 
transmission line insertion loss. 


SI measurement examples (by Intel) 


Left image: PCB transmission line insertion loss measurement when immersed in fluid. 
Right image: Socket and package substrate measurement diagram when immersed in fluid. 


The impact to microstrip impedance needs to be considered. Microstrip impedance can be 
reduced because of the dielectric properties of the immersion medium compared with air. It is 
recommended to consider this when specifying PCB impedance for manufacturing. The chemical 
properties of the immersion fluid used need to be evaluated to determine the impact. The 
impact to microstrip crosstalk also needs to be considered. The far-end crosstalk effect is 
typically reduced, while no obvious near-end crosstalk effect is observed. 
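
To give a feel for the size of the microstrip impedance shift, the Python sketch below uses a 
first-order filling-factor approximation: the commonly used Hammerstad expression for the 
effective permittivity of a microstrip in air is generalized by replacing the permittivity of 
air with that of the fluid, and the in-air impedance is scaled accordingly. This is an 
assumption made for illustration only; it ignores solder mask, trace thickness and frequency 
dependence, and the dimensions and dielectric constants are hypothetical. It does not replace 
field-solver models or measurement. 

```python
import math


def effective_permittivity(er_substrate, er_cover, width, height):
    """Quasi-static effective permittivity of a microstrip under a cover medium.

    Filling-factor approximation (intended for width/height >= 1): the
    permittivity of air (1.0) in the Hammerstad expression is replaced
    by er_cover.
    """
    shape = 1.0 / math.sqrt(1.0 + 12.0 * height / width)
    return (er_substrate + er_cover) / 2.0 + (er_substrate - er_cover) / 2.0 * shape


def immersed_impedance(z_air_ohm, er_substrate, er_fluid, width, height):
    """Scale an in-air impedance by sqrt(eps_eff_air / eps_eff_fluid)."""
    eps_air = effective_permittivity(er_substrate, 1.0, width, height)
    eps_fluid = effective_permittivity(er_substrate, er_fluid, width, height)
    return z_air_ohm * math.sqrt(eps_air / eps_fluid)


# Hypothetical example: 50 ohm trace in air, substrate Dk 4.0, fluid Dk 2.0,
# trace width 0.2 mm on a 0.1 mm dielectric.
z_fluid = immersed_impedance(50.0, 4.0, 2.0, width=0.2, height=0.1)
print(f"estimated impedance in fluid: {z_fluid:.1f} ohm")
```

Under these assumptions the estimated impedance drops by a few ohms, which is consistent with 
the recommendation to account for the fluid when specifying PCB impedance. 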


Generally, there is no significant impact to stripline loss, impedance, and crosstalk. 


The connector and socket typically are designed with air as surrounding medium. When 
immersed in fluid, the design target impedance will likely be changed. New models with 
surrounding air replaced with fluid should be created, and performance impact needs to be 
understood. 


There is a potential for the mismatch between the capacitive impedance and the inductive 
impedance of the socket to be balanced by the fluid. In general, this effect is not expected 
to significantly impact the channel performance. For future socket designs, analysis should be 
performed to check the impact of immersion on SI performance. 


Most connectors tuned for use in air may fail when immersed, because the capacitive impedance 
mismatch increases and the inductive impedance mismatch decreases in the fluid. It is 
recommended to evaluate the fluid impact during the design. 


Cables also pose a performance challenge. High speed cables for Ethernet, PCIe, etc. have 
stringent performance specifications, and small changes may have a big performance impact. For 
example, there is a potential risk of fluid being wicked up the cable sheathing and changing 
its performance. Cables should be tested for long term electrical performance and reliability. 


Component | Insertion loss | Impedance | Crosstalk 
PCB microstrip | Less loss, may be due to lower humidity content | Will be lower in reference to air | Lower FEXT; little NEXT impact 
Package and socket | No significant impact | No significant impact | FEXT from socket slightly increased; no significant NEXT impact 
Connector | More loss at high frequency, due to more reflection | Significantly lower impedance peak; significantly more return loss | No significant FEXT and NEXT impact; due to the impedance delta, considerations should be made for reflections 


As fluids age, there is potential for the material properties to change, which in turn changes 
the SI performance. Potential changes include chemical property changes, contamination from 
the external environment, and contamination washed out of internal component materials. 
Contamination from internal or external sources can create electrical performance issues and 
contribute to signaling performance changes. Proper cleaning of components before initial 
deployment, a filtration system, and continuous monitoring of the fluid properties should be 
considered. It is recommended to periodically check the fluid's electrical properties to 
ensure they are within the specified range. 


5.3 Adjustable temperature settings 


Many systems are equipped with a thermal sensor which is used to determine the 
environmental temperature of the IT equipment. The firmware monitors this sensor and 
determines whether it is safe to switch on. 


Since immersion can work with much higher temperatures compared to air and the solutions 
vary in thermal effectiveness and tolerances, this temperature threshold should be a 
configurable item for immersion solution vendors, IT integrators, or end users. 


Alternatively, an option to disable the platform thermal protection should be considered which 
allows fallback to integrated thermal management of chips. 


Many components are monitored for their thermal status during operation. When 
implementing IT equipment in immersion, these tolerances may be impacted. Some sensors 
may be set to higher tolerances and others should remain unchanged. 


An alternate set of temperature thresholds should be available to integrators and end users 
when implementing IT in immersion to facilitate optimized thermal monitoring. 
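
One way to picture an alternate set of thresholds is as a per-medium table consulted by the 
power-on check. The Python sketch below is purely illustrative: the field names, the 
`may_power_on` helper and all temperature values are hypothetical and do not come from any 
firmware specification. It shows one threshold raised for immersion while another is left 
unchanged. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalThresholds:
    """Hypothetical per-medium limits in degrees C (illustrative values only)."""
    power_on_max_ambient_c: float  # environmental sensor limit for switching on
    component_critical_c: float    # component limit, unchanged between media


THRESHOLDS = {
    "air": ThermalThresholds(power_on_max_ambient_c=35.0, component_critical_c=85.0),
    "immersion": ThermalThresholds(power_on_max_ambient_c=55.0, component_critical_c=85.0),
}


def may_power_on(sensor_c, medium):
    """Return True if the environmental sensor reading permits power-on."""
    return sensor_c <= THRESHOLDS[medium].power_on_max_ambient_c


# A 45 C reading blocks power-on with the air profile but not with the
# immersion profile.
print(may_power_on(45.0, "air"))
print(may_power_on(45.0, "immersion"))
```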


5.4 Fan control and detection 


Most electronics are initially designed for air and manage the airflow in the system. Combined 
with airflow management, there are often safety features to prevent server activation without 
fans present or while fans are disabled or defective. 


Since fans are not supported in most, if not all, immersion strategies, any airflow management 
should be disabled. Any safety controls related to airflow, such as fan detection, must also be 
disabled, and disabling them should not trigger any alerts. 


5.5 System performance 


Immersion allows more efficient cooling compared to air. For this reason, an IT system has the 
potential to operate at higher performance for longer time periods. The firmware should allow 
for continuous operation in turbo mode or even allow overclocking. Settings for this should be 
made available at least to immersion solution vendors and IT integrators. 


5.6 Management reporting (IPMI) 


The management port of a server allows access to remote control features and status reporting 
of the server system. The telemetry which is generated by air designed systems is based on 
basic system information and its condition in an air environment. 


5.7 Immersion support in firmware 


For business economics reasons, IT equipment will most likely be designed for air with 
compatibility for immersion. For this reason, the firmware may include a set of thresholds 
and settings which are optimized for immersion. Examples of methods for switching between air 
and immersion are: 


a) BIOS switch for immersion. The default mode may then be air, but when immersion is 
selected, all thresholds, safety features and performance settings are optimized for 
immersion. This means that immersion becomes part of the standard firmware of every 
system. Further options could be considered to differentiate between specific 
immersion solutions (brand/type) or immersion categories (single phase vs 2-phase). 

b) Custom firmware. Custom firmware may be further optimized for immersion and could 
also be tailored to a specific immersion solution or a range of solutions. This 
custom firmware solution should be made available to fluid solution vendors and IT 
integrators. 
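
The BIOS switch in option a) can be pictured as a single mode setting from which the dependent 
firmware settings are derived. The Python sketch below is a hypothetical illustration of that 
idea; the mode names, setting keys and the `boot_allowed` helper are assumptions, not part of 
any existing BIOS or BMC interface. 

```python
from enum import Enum


class CoolingMode(Enum):
    AIR = "air"
    IMMERSION_SINGLE_PHASE = "immersion-1p"
    IMMERSION_TWO_PHASE = "immersion-2p"


def firmware_settings(mode):
    """Derive dependent settings from the selected cooling mode (hypothetical keys)."""
    immersed = mode is not CoolingMode.AIR
    return {
        "fan_control_enabled": not immersed,
        "fan_presence_check": not immersed,
        "fan_fault_alerts": not immersed,
        "sustained_turbo_allowed": immersed,
        "threshold_profile": "immersion" if immersed else "air",
    }


def boot_allowed(mode, fans_detected):
    """Allow boot without fans only when the fan presence check is disabled."""
    return fans_detected or not firmware_settings(mode)["fan_presence_check"]


print(firmware_settings(CoolingMode.IMMERSION_TWO_PHASE))
print(boot_allowed(CoolingMode.AIR, fans_detected=False))
print(boot_allowed(CoolingMode.IMMERSION_SINGLE_PHASE, fans_detected=False))
```

Keeping air as the default mode means a system that is never reconfigured behaves exactly as 
an air-designed system would. 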


Appendix: Common practices for retrofitting air-designed IT equipment 


1. Disconnect any fan and, if the firmware cannot operate without one, connect a fan emulator 
that supplies the required pulses to emulate a functioning fan. 

2. Replace thermal paste with either a type compatible with immersion or another TIM 
suitable for immersion (e.g. Indium Foil). 

3. Check whether PSUs contain solenoids or relays which might not be operable while 
submerged. 

4. Ensure no spinning drives are immersed except the helium sealed type. SSDs do not suffer 
from incompatibility issues due to their absence of moving components. 

5. If an off-the-shelf chassis is used, ensure the selection of a model whose brackets (“ears”) 
are on the side which will be exposed at the top of the tank, or a model for which a different 
holding, securing and hoisting mechanism can be easily retrofitted. 

6. Any system that will be immersed should be thoroughly cleaned from dust particles and 
other contaminants which might pollute the dielectric fluid. 

7. For open bath configurations, whenever possible use servers with all available ports and 
connections on the “top” side of the server. Ports at the “bottom” of the tank will not be 
easily accessible and, if cables must be connected there, the cabling will be complicated and 
potentially unsafe. 

8. Cable routing is essential. The designer must be careful to ensure that the cables have 
enough length and space to pull out the chassis if they cannot be removed for servicing. 

9. For testing purposes and reverse engineering, fan simulators may be used. Please refer to 
the OCP Fan Sim spec for more details. 


Fan Sim specification: https://www.opencompute.org/documents/open-compute-specification-fan-sim-spec-2-pdf 


Appendix: Glossary 


BBU Battery Backup Unit 


BEC Boiling Enhancement Coating: surface microstructure enhancing coating to improve 
heat transfer properties 


CRAC Computer Room Air Conditioner 

DC Data Center 

DUT Device Under Test 

EPDM Ethylene Propylene Diene Monomer: a type of synthetic rubber 
HSC Hot-swap Controller 


IPMI Intelligent Platform Management Interface: A set of computer interface specifications 
for an autonomous computer subsystem. 


PCB Printed Circuit Board 

PCBA Printed Circuit Board Assembly 
PDU Power Distribution Unit 

PSU Power Supply Unit 

PUE Power Usage Effectiveness 


QSFP Quad (4-channel) Small Form-factor Pluggable: A common optical component type for 
data center servers and other computing and communications equipment. 


TDP Thermal Design Power: value describing the thermal limits of a component or computer 
system 


TIM Thermal Interface Material: any material inserted between two parts to enhance the 
thermal coupling 


VR Voltage Regulator 


About Open Compute Project 


The Open Compute Project Foundation is a 501(c)(6) organization which was founded in 2011 
by Facebook, Intel, and Rackspace. Our mission is to apply the benefits of open source to 
hardware and rapidly increase the pace of innovation in, near and around the data center and 
beyond. The Open Compute Project (OCP) is a collaborative community focused on 